Will AutoML software replace Data Scientists?

Photo by Markus Winkler on Unsplash

In recent years, many automated machine learning (AutoML) tools have been introduced. They automate some of the tasks that a Data Scientist usually has to perform manually, and they have reached a remarkable level of sophistication and effectiveness. Are they a threat to the Data Scientist’s job, or are they an opportunity?

What is AutoML?

AutoML is a generic term for software that performs Machine Learning tasks automatically. These tools usually automate the entire pipeline: for example, data cleaning, encoding, feature selection, model selection, and hyperparameter tuning. They can be Python libraries like Auto-Sklearn or standalone platforms like DataRobot.

AutoML tools take over the tedious steps that consume most of a Data Scientist’s time. They try combinations of the parameters of a pipeline (e.g. the values used to fill blanks, the scaling algorithm, the model type, the model hyperparameters) and select the combination that gives the best value of some performance metric (the lowest RMSE, for example, or the highest Area Under the ROC Curve) in k-fold cross-validation, using a search algorithm such as Grid Search or Random Search.
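To make the idea concrete, here is a minimal sketch of an exhaustive (grid) search over pipeline choices, scored by k-fold cross-validation on RMSE. It uses only the Python standard library and is not how any particular AutoML tool is implemented. The toy data, the function names (`knn_predict`, `cv_rmse`) and the two-parameter search space (fill strategy and number of neighbors) are assumptions made for illustration.

```python
import itertools
import statistics

# Hypothetical toy data: one feature with missing values (None) and a numeric target.
X = [1.0, 2.0, None, 4.0, 5.0, 6.0, None, 8.0, 9.0, 10.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1, 18.0, 20.2]

def knn_predict(train_x, train_y, x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / k

def cv_rmse(X, y, fill_strategy, k_neighbors, n_folds=5):
    """Mean RMSE over n_folds folds for one pipeline configuration."""
    fold_rmse = []
    for fold in range(n_folds):
        test_idx = [i for i in range(len(X)) if i % n_folds == fold]
        train_idx = [i for i in range(len(X)) if i % n_folds != fold]
        # Learn the fill value from the training fold only, to avoid leakage.
        observed = [X[i] for i in train_idx if X[i] is not None]
        if fill_strategy == "mean":
            fill = statistics.mean(observed)
        else:
            fill = statistics.median(observed)
        train_x = [fill if X[i] is None else X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        squared_errors = []
        for i in test_idx:
            x = fill if X[i] is None else X[i]
            prediction = knn_predict(train_x, train_y, x, k_neighbors)
            squared_errors.append((prediction - y[i]) ** 2)
        fold_rmse.append((sum(squared_errors) / len(squared_errors)) ** 0.5)
    return statistics.mean(fold_rmse)

# The search space: every combination of these values is one candidate pipeline.
search_space = {"fill_strategy": ["mean", "median"], "k_neighbors": [1, 3, 5]}

results = []
for fill_strategy, k_neighbors in itertools.product(*search_space.values()):
    score = cv_rmse(X, y, fill_strategy, k_neighbors)
    results.append((score, fill_strategy, k_neighbors))

# RMSE is an error, so the best configuration is the one with the lowest score.
best_score, best_fill, best_k = min(results)
print(f"best: fill={best_fill}, k={best_k}, RMSE={best_score:.3f}")
```

A Random Search would sample a fixed number of configurations from the same search space instead of enumerating all of them, which matters once the number of combinations grows large.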

They can really simplify life for anyone who has to build a model from scratch, and sometimes they explore combinations and scenarios that a Data Scientist may not have thought of.

Does it replace a Data Scientist’s work?

Some may think that AutoML will replace the Data Scientist’s work and make the job obsolete in the future. Nothing could be further from the truth. Let’s see why.

Data Science is not (only) Machine Learning

A Data Scientist is more than a person who uses Machine Learning models. A Data Scientist uncovers the information hidden inside data, extracts useful correlations, helps prepare the right data to feed an ML pipeline, and provides useful insights about the business that generated the data. These activities are the most important part of Data Science and cannot be fully automated. They rely on deep knowledge of the business and on fluent, effective use of the language that people in the business speak, above all the business managers.

All of this makes the Data Scientist’s job more complicated and more interesting than running Machine Learning models, and it lies outside the scope of AutoML.

AutoML software automates Machine Learning tasks, not the whole Data Science process. Machine Learning is just a small part of a Data Scientist’s job, and arguably neither the most important nor the most challenging one. Understanding the data, the information, and the business context is the real challenge and, if these tasks are not fully accomplished, Machine Learning will never be the magic wand that solves all the problems.

AutoML doesn’t work alone

AutoML is software, so it always needs somebody with the right skills to use it. In fact, AutoML results must be validated by a professional Data Scientist to make sure they are correct and make sense in the business environment in which they have been produced. It’s not unusual to produce a model that seems perfect on paper but, in reality, doesn’t produce any useful business insights or, in the worst case, makes trivial predictions. That’s why a Data Scientist must always be there to make sure that the model is telling us something new and not just chewing over something old.
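One simple check for trivial predictions is to compare the model with a naive baseline. The sketch below is an assumption about how such a check might look, not a procedure prescribed by any AutoML tool. The labels and the `model_pred` list are made up, and the helper names `accuracy` and `majority_baseline` are hypothetical.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def majority_baseline(y_true):
    """Predict the most frequent label for every sample."""
    # In practice the most frequent label would come from the training set.
    most_common_label, _ = Counter(y_true).most_common(1)[0]
    return [most_common_label] * len(y_true)

# Hypothetical imbalanced test labels: 18 negatives and 2 positives.
y_true = [0] * 18 + [1] * 2
# Hypothetical model output, standing in for the predictions of an AutoML model.
model_pred = [0] * 20

model_acc = accuracy(y_true, model_pred)
baseline_acc = accuracy(y_true, majority_baseline(y_true))
beats_baseline = model_acc > baseline_acc

print(f"model accuracy:    {model_acc:.2f}")
print(f"baseline accuracy: {baseline_acc:.2f}")
print(f"beats baseline:    {beats_baseline}")
```

Here the model reaches 90% accuracy, which looks good on paper, yet it does no better than always predicting the majority class. Spotting this and deciding what to do about it is the kind of judgment that still falls to the Data Scientist.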

Is AutoML useful to Data Scientists?

Yes, I think it’s very useful, because it automates the tedious tasks that usually require a lot of code and are error-prone. Without AutoML, a Data Scientist must build their own ML pipeline from scratch. Every ML model has its own requirements (e.g. feature scaling for neural networks), so the complete set of pipelines to test can become quite complex and time-consuming. An AutoML tool lets a Data Scientist build a good ML model without worrying too much about the code. Remember: a Data Scientist is not a software engineer, so they should write as little code as possible in order to focus on data and information.
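As an illustration of a model-specific requirement, the sketch below shows how feature scaling can change the result of a scale-sensitive model. For brevity it uses a nearest-neighbor lookup rather than a neural network, and the data (age and income columns) and the helper names `fit_scaler`, `transform` and `nearest_index` are assumptions made for the example.

```python
import math
import statistics

# Hypothetical training rows: (age in years, income in dollars).
train = [(25, 50000), (30, 52000), (55, 50500), (60, 51500)]
query = (27, 50400)

def fit_scaler(rows):
    """Compute (mean, standard deviation) for each column of the training rows."""
    columns = list(zip(*rows))
    return [(statistics.mean(c), statistics.pstdev(c)) for c in columns]

def transform(row, params):
    """Standardize one row with the parameters learned from the training rows."""
    return tuple((value - mean) / std for value, (mean, std) in zip(row, params))

def nearest_index(rows, point):
    """Index of the row with the smallest Euclidean distance to point."""
    return min(range(len(rows)), key=lambda i: math.dist(rows[i], point))

# Without scaling, income dominates the distance because its values are larger.
nearest_raw = nearest_index(train, query)

# With standardization, both features contribute on a comparable scale.
params = fit_scaler(train)
scaled_train = [transform(row, params) for row in train]
nearest_scaled = nearest_index(scaled_train, transform(query, params))

print(f"nearest neighbor without scaling: {train[nearest_raw]}")
print(f"nearest neighbor with scaling:    {train[nearest_scaled]}")
```

Without scaling, the closest training row is the 55-year-old with a similar income. After standardization it is the 25-year-old, who is close in both age and income. A tree-based model would not need this step at all, which is why the set of pipelines to test grows with every model type considered.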

Conclusions

I think that Data Scientists must keep up with change and innovation, and AutoML can become a very useful ally if they use it properly. By automating the tedious tasks, they will likely have more time to spend analyzing information, which is the real goal of a Data Scientist.


Originally published on Towards Data Science.